
Use cluster endpoint at startup#90

Draft
piceri wants to merge 8 commits into main from piceri/startup-cluster-endpoint


@piceri piceri commented May 29, 2026

This change allows deployment tracker to use the cluster endpoint at startup to send the current state of the cluster. This will reduce the load caused at startup, when deployment tracker currently sends the state of the cluster one container at a time.

Change details:

  • While the informers sync, events are no longer added to the work queue
  • Once the informers have synced, use the pod informer to get the current list of running pods
  • Any new events are then added to the work queue
  • Current pod list is processed and deduped by deployment name + digest
  • Send the list to the cluster endpoint
    • If this fails, deployment tracker does not continue to run
  • Use the response to fill observed and unknown caches
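
The gating and dedupe steps above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `podInfo`, `tracker`, `onAdd`, `dedupe`, `startup`, and the `send` callback are all hypothetical names, the work queue is a plain slice rather than a real work queue, and pods are reduced to the two fields used as the dedupe key.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// podInfo is a hypothetical, simplified stand-in for a running pod. Only the
// two fields used as the dedupe key (deployment name + digest) are kept.
type podInfo struct {
	Deployment string
	Digest     string
}

// tracker is a hypothetical stand-in for the controller.
type tracker struct {
	// syncing is true while the informers are still syncing. Event handlers
	// drop events while it is set, since the startup list covers that state.
	syncing atomic.Bool

	mu    sync.Mutex
	queue []podInfo // stand-in for the real work queue
}

// onAdd mimics an informer add handler: events are queued only after sync.
func (t *tracker) onAdd(p podInfo) {
	if t.syncing.Load() {
		return
	}
	t.mu.Lock()
	defer t.mu.Unlock()
	t.queue = append(t.queue, p)
}

// dedupe collapses pods sharing a deployment name and digest, keeping the
// order in which each pair was first seen.
func dedupe(pods []podInfo) []podInfo {
	seen := make(map[podInfo]struct{}, len(pods))
	out := make([]podInfo, 0, len(pods))
	for _, p := range pods {
		if _, ok := seen[p]; ok {
			continue
		}
		seen[p] = struct{}{}
		out = append(out, p)
	}
	return out
}

// startup takes the list of running pods obtained after the informers synced,
// re-enables event queuing, and sends the deduped list via send (a
// hypothetical callback standing in for the cluster endpoint call). A
// non-nil error means the caller should not continue running.
func (t *tracker) startup(running []podInfo, send func([]podInfo) error) error {
	t.syncing.Store(false)
	if err := send(dedupe(running)); err != nil {
		return fmt.Errorf("send cluster state at startup: %w", err)
	}
	return nil
}
```

In this sketch a caller would set `syncing` to true before starting the informers, wait for them to sync, list the running pods, and then call `startup`; how the response populates the observed and unknown caches is left out because its shape is not described here.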

piceri added 8 commits May 22, 2026 14:44
Signed-off-by: Eric Pickard <piceri@github.com>

@ajbeattie ajbeattie left a comment


Nice 💯, looking great on initial pass. Still working through some of the changes but wanted to go ahead and raise the Job/CronJob suggestion ⬇️

// informerSyncTimeout is the maximum time allowed for all informers to sync
// and prevents sync from hanging indefinitely.
informerSyncTimeout time.Duration
syncing atomic.Bool

Probably worth adding a comment here, and a few in New/Run below, explaining what syncing means and does.

continue
}

if pod.Status.Phase != corev1.PodRunning || !workload.HasSupportedOwner(pod) {

I think we'll also need to accept terminal-state jobs here, as in the AddFunc, i.e. (workload.IsTerminalPhase(pod) && workload.GetJobOwnerName(pod) != ""), so that jobs that run and complete while the sync is happening aren't dropped.
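
One way to read the suggested condition is sketched below. It is self-contained and uses hypothetical stand-ins rather than the real types: `pod`, `podPhase`, `isTerminalPhase`, and `includeInStartupList` are invented for illustration, the `SupportedOwner` and `JobOwnerName` fields stand in for the results of `workload.HasSupportedOwner` and `workload.GetJobOwnerName`, and treating Succeeded and Failed as the terminal phases is an assumption about what `workload.IsTerminalPhase` checks.

```go
package main

// podPhase is a hypothetical stand-in for the Kubernetes pod phase type.
type podPhase string

const (
	podPending   podPhase = "Pending"
	podRunning   podPhase = "Running"
	podSucceeded podPhase = "Succeeded"
	podFailed    podPhase = "Failed"
)

// pod is a hypothetical, flattened stand-in for the pod object.
type pod struct {
	Phase          podPhase
	SupportedOwner bool   // stands in for workload.HasSupportedOwner(pod)
	JobOwnerName   string // stands in for workload.GetJobOwnerName(pod)
}

// isTerminalPhase assumes "terminal" means Succeeded or Failed.
func isTerminalPhase(p pod) bool {
	return p.Phase == podSucceeded || p.Phase == podFailed
}

// includeInStartupList reports whether a pod should be kept when building
// the startup list: running pods with a supported owner, plus pods owned by
// a job that already reached a terminal phase.
func includeInStartupList(p pod) bool {
	if isTerminalPhase(p) && p.JobOwnerName != "" {
		return true
	}
	return p.Phase == podRunning && p.SupportedOwner
}
```

With a predicate like this, the loop would `continue` when `includeInStartupList` returns false, so a pod from a job that finished during the sync is kept instead of skipped.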
